Sensor based semantic object generation

ABSTRACT

Provided are methods, systems, and devices for generating semantic objects and an output based on the detection or recognition of the state of an environment that includes objects. State data, based in part on sensor output, can be received from one or more sensors that detect a state of an environment including objects. Based in part on the state data, semantic objects are generated. The semantic objects can correspond to the objects and include a set of attributes. Based in part on the set of attributes of the semantic objects, one or more operating modes associated with the semantic objects can be determined. Based in part on the one or more operating modes, object outputs associated with the semantic objects can be generated. The object outputs can include one or more visual indications or one or more audio indications.

FIELD

The present disclosure relates generally to generating semantic objects and an output based on the detection or recognition of the state of an environment that includes objects.

BACKGROUND

Object detection systems can capture a variety of information about the objects in an environment, including, for example, the appearance of an object. Associating aspects of a detected object (e.g., the appearance of the object) with another piece of information, such as the identity of the object, can be useful in various applications such as facial recognition, in which face detection and recognition can be used to gain access to a device based on whether the recognized face corresponds with an authorized user of the device. However, many existing object detection systems require a great deal of user input and interaction, which can be burdensome. Further, many of the existing object detection systems provide limited functionality or have functionality that receives scant use due to a cumbersome user interface. Accordingly, it would be beneficial if there were a way to more effectively capture, process, and manipulate information associated with the state of an environment.

SUMMARY

Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or may be learned from the description, or may be learned through practice of the embodiments.

One example aspect of the present disclosure is directed to a method for generating semantic objects and an output based on the detection or recognition of the state of an environment that includes objects. The method can include receiving, by a computing system comprising one or more computing devices, state data based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects. The method can also include generating, by the computing system, based in part on the state data, one or more semantic objects corresponding to the one or more objects. The one or more semantic objects can comprise a set of attributes. The method can include determining, by the computing system, based in part on the set of attributes of the one or more semantic objects, one or more operating modes associated with the one or more semantic objects. Further, the method can include generating, by the computing system, based in part on the one or more operating modes, one or more object outputs associated with the one or more semantic objects. The one or more object outputs can comprise one or more visual indications or one or more audio indications.

Another example aspect of the present disclosure is directed to one or more tangible, non-transitory computer-readable media storing computer-readable instructions that when executed by one or more processors cause the one or more processors to perform operations. The operations can include receiving state data based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects. The operations can also include generating, based in part on the state data, one or more semantic objects corresponding to the one or more objects. The one or more semantic objects can comprise a set of attributes. The operations can include determining, based in part on the set of attributes of the one or more semantic objects, one or more operating modes associated with the one or more semantic objects. Further, the operations can include generating, based in part on the one or more operating modes, one or more object outputs associated with the one or more semantic objects. The one or more object outputs can comprise one or more visual indications or one or more audio indications.

Another example aspect of the present disclosure is directed to a computing system comprising one or more processors, and one or more non-transitory computer-readable media storing instructions that when executed by the one or more processors cause the one or more processors to perform operations. The operations can include receiving state data based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects. The operations can also include generating, based in part on the state data, one or more semantic objects corresponding to the one or more objects. The one or more semantic objects can comprise a set of attributes. The operations can include determining, based in part on the set of attributes of the one or more semantic objects, one or more operating modes associated with the one or more semantic objects. Further, the operations can include generating, based in part on the one or more operating modes, one or more object outputs associated with the one or more semantic objects. The one or more object outputs can comprise one or more visual indications or one or more audio indications.

Other example aspects of the present disclosure are directed to other computer-implemented methods, systems, apparatus, tangible, non-transitory computer-readable media, user interfaces, memory devices, and electronic devices for generating semantic objects and an output based on the detection or recognition of the state of an environment that includes objects.

These and other features, aspects, and advantages of various embodiments will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the related principles.

BRIEF DESCRIPTION OF THE DRAWINGS

Detailed discussion of embodiments directed to one of ordinary skill in the art is set forth in the specification, which makes reference to the appended figures, in which:

FIG. 1 depicts a diagram of an example system according to example embodiments of the present disclosure;

FIG. 2 depicts a diagram of an example device according to example embodiments of the present disclosure;

FIG. 3 depicts an example of sensor based semantic object generation including image capture according to example embodiments of the present disclosure;

FIG. 4 depicts an example of sensor based semantic object generation including audio generation according to example embodiments of the present disclosure;

FIG. 5 depicts an example of sensor based semantic object generation including text translation according to example embodiments of the present disclosure;

FIG. 6 depicts an example of sensor based semantic object generation including text recognition according to example embodiments of the present disclosure;

FIG. 7 depicts an example of sensor based semantic object generation including text recognition according to example embodiments of the present disclosure;

FIG. 8 depicts an example of sensor based semantic object generation including object recognition according to example embodiments of the present disclosure;

FIG. 9 depicts an example of sensor based semantic object generation including object recognition according to example embodiments of the present disclosure;

FIG. 10 depicts an example of sensor based semantic object generation including location identification according to example embodiments of the present disclosure;

FIG. 11 depicts an example of sensor based semantic object generation including location identification according to example embodiments of the present disclosure;

FIG. 12 depicts an example of sensor based semantic object generation including navigation according to example embodiments of the present disclosure;

FIG. 13 depicts an example of an interface element of a sensor based semantic object generation including location identification according to example embodiments of the present disclosure;

FIG. 14 depicts a flow diagram of sensor based semantic object generation according to example embodiments of the present disclosure;

FIG. 15 depicts a flow diagram of sensor based semantic object generation according to example embodiments of the present disclosure;

FIG. 16 depicts a flow diagram of sensor based semantic object generation according to example embodiments of the present disclosure; and

FIG. 17 depicts a flow diagram of sensor based semantic object generation according to example embodiments of the present disclosure.

DETAILED DESCRIPTION

Example aspects of the present disclosure are directed to detecting, recognizing, and/or identifying objects in an environment, generating semantic objects (e.g., a data structure that is stored in a storage device and that includes one or more attributes associated with one or more objects) based on the objects, and generating an output (e.g., visual indications and/or audio indications) based on the semantic objects. The disclosed technology can receive state data that is associated with the state of an environment (e.g., an outdoor area or an indoor area) and objects in the environment (e.g., buildings, people, vehicles, consumer goods, and/or written materials), generate one or more semantic objects that correspond to the one or more objects (e.g., a handbag semantic object for a physical handbag), determine one or more operating modes associated with the one or more semantic objects (i.e., determine how to process the one or more objects), and generate one or more object outputs that can include one or more visual indications (e.g., one or more images including textual information associated with the one or more objects) or one or more audio indications (e.g., one or more sounds associated with the one or more objects).

As such, the disclosed technology can more effectively recognize objects in an environment and perform various functions based on those objects in a way that is unobtrusive and can, in some situations, require a minimal level of user input. Further, in some embodiments, by generating one or more semantic objects based on persistent collection of sensor output from real-world objects, the disclosed technology is able to highlight areas of interest that might otherwise go unnoticed. Further, by determining an operational mode to use in gathering and processing sensor inputs, the disclosed technology is able to conserve computational resources and provide information that is more relevant to a user's needs.

By way of example, the disclosed technology can include a computing device that is carried by a user in an environment (e.g., an urban environment) that includes a variety of objects. As the user walks through the environment, the user can hold the computing device in their hand. The computing device can include a camera (e.g., a periscopic camera) that is positioned on a portion of the computing device (e.g., the top edge of the computing device) so that when the longest side of the device is held perpendicular to the user and/or parallel to the ground, the camera can capture one or more images without the user having to aim the camera at objects in the environment. In particular, a camera can be positioned at the top edge of the computing device so that when the computing device is held in a comfortable position for the user (e.g., with the longest side of the device held perpendicular to the user and/or parallel to the ground) the camera has a field of view that is generally in the same direction as the user's vision (e.g., the view in front of the user in the direction the user is facing).

As the user walks through the environment, an electronic device (e.g., a television set) in a store display window can capture the user's interest. As the user approaches the store display window, the camera can capture images of the electronic device, and the computing device can generate a semantic object that is associated with the electronic device. The semantic object associated with the object such as, for example, the electronic device, can include one or more attributes including its type (e.g., television set), size (e.g., screen size of sixty-five inches), make (e.g., the make of the television set manufacturer), and model (e.g., a model number associated with the television set).
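
For illustration only, the following is a minimal sketch of how such a semantic object might be represented as a data structure; the class name, attribute names, and example values (SemanticObject, object_type, and so on) are assumptions made for this example and are not defined by the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

@dataclass
class SemanticObject:
    """Hypothetical semantic object: a stored record of attributes for a detected object."""
    object_id: str                                   # identity of the detected object
    object_type: str                                 # type/category, e.g. "television set"
    attributes: Dict[str, Any] = field(default_factory=dict)  # e.g. size, make, model
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude) if known

# Example: a semantic object for the television set in the store display window.
tv = SemanticObject(
    object_id="display-window-tv",
    object_type="television set",
    attributes={"screen_size_inches": 65, "make": "ExampleMake", "model": "X-6500"},
)
```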

Based on the semantic object, the computing device can determine an operating mode to use on the semantic object. The operating mode can indicate a type of processing that the computing device and/or associated computing systems will perform on the semantic object. For example, the computing device can use a text recognition mode when text is detected in an object. In this example, the computing device can determine that the object is merchandise and can access one or more remote data sources and generate queries (e.g., perform a search through an Internet search engine) based on the attributes of the semantic object associated with the object.

The disclosed technology can then provide the user with an output that includes information about the electronic device itself as well as, for example, other stores where the electronic device could be purchased, product ratings associated with the electronic device, and links to websites that offer more information about the electronic device. In this way, the computing device can perform semantic lifting, including sensor-based semantic object generation, to more efficiently process sensor outputs and provide users with the greater convenience that results from the computing device performing tasks that would otherwise be performed by a user.

In some embodiments, the disclosed technology can include a computing system (e.g., a semantic processing system) that can include one or more computing devices (e.g., devices with one or more computer processors and a memory that can store one or more instructions) that can exchange (send and/or receive), process, generate, and/or modify: data including one or more information patterns or structures that can be stored on one or more memory devices (e.g., random access memory) and/or storage devices (e.g., a hard disk drive and/or a solid state drive); and/or one or more signals (e.g., electronic signals). The data and/or one or more signals can be exchanged by the computing system with various other devices including remote computing devices that can provide data associated with, or including, semantic type data associated with the various attributes of objects (e.g., the price of an item of merchandise); and/or one or more sensor devices that can provide sensor output for a geographical area (e.g., camera images from an Internet accessible camera device) that can be used to determine the state of an environment that includes one or more objects.

In some embodiments, the semantic processing system can include a display component (e.g., a liquid crystal display (LCD), an organic light emitting diode (OLED), a plasma display panel, electronic ink, and/or a cathode ray tube) that is configured to display one or more images that can include images of an environment that includes one or more objects that are detected by one or more sensors.

The semantic processing system can receive data, including, for example, state data that is based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects, including physical objects (e.g., buildings, books, and/or baggage). The state data can include information associated with the state of the environment and the one or more objects in the environment, including the location of the one or more objects, the time of day that the sensor output from the one or more objects is captured, and/or one or more physical characteristics of the objects in the environment (e.g., size, appearance, and/or one or more sounds produced by the one or more objects).

In some embodiments, the one or more sensors can include one or more optical sensors (e.g., one or more cameras); one or more periscopic cameras, including one or more cameras that have a field of view that exceeds one hundred and eighty degrees; one or more audio sensors (e.g., one or more microphones); one or more tactile sensors; one or more barometric sensors; one or more gyroscopic sensors; one or more accelerometers, including a configuration in which the one or more accelerometers can determine acceleration along three axes (e.g., x axis, y axis, and z axis); one or more humidity sensors, including one or more sensors that can detect the level of moisture in the air; one or more electromagnetic sensors; and/or one or more thermal sensors.

Further, the one or more periscopic cameras can be configured or positioned to capture the one or more images including the one or more objects or portions of the one or more objects that are not within a visual plane of the display component. The display component of the semantic processing system can include a visual plane, which can include a plane that, if it were an optical sensor, would capture images within a range of less than one hundred and eighty degrees of the center of the optical sensor (e.g., images perpendicular to the visual plane would not be captured). For example, if the semantic processing device is in the shape of a rectangular cuboid, the display component (e.g., an LCD screen) can be located on one or both of the two sides of the cuboid with the greatest surface area, and the one or more periscopic cameras can be located on one or more of the four sides of the cuboid that do not have the greatest surface area.

Further, the semantic processing system can operate on a continuous basis so that detection, identification, and/or recognition of the environment, including one or more objects in the environment, can be performed on an ongoing basis without input or instruction from a user. The semantic processing system can also provide indications of the one or more objects that are recognized, or of an operating mode (e.g., pathfinding mode, translation mode, and/or object detection mode), as part of an interface (e.g., a graphical user interface that includes a status bar).

In addition, in some embodiments, the recognition of the one or more objects can be performed as a continuous process as a background operation (e.g., on a background thread). Thus, in some embodiments, the semantic processing system can continuously operate in the background to recognize objects within the environment based on sensor data indicative of the environment. In some embodiments, such background operation can include operating to recognize objects even when a camera application is not being executed by the system (e.g., operating in the background even when the user is not operating the camera of the system). The user can be provided with controls to control when the semantic processing system operates to recognize objects and when and what type of data is collected for use by the semantic processing system.

The one or more sensors can be configured to detect the state (e.g., a physical state) of the environment including one or more properties or characteristics of the one or more objects. Further, the semantic processing system can access a chronometer (e.g., a locally based chronometer or a chronometer at a remote location) that can be used to determine a time of day and/or a duration of one or more events including local events (e.g., events that are detectable by the one or more sensors) and non-local events (e.g., events that occur in a location that is not detectable by the one or more sensors). The one or more properties or characteristics of the environment can include a time of day and/or a geographic location (e.g., a latitude and longitude associated with the environment). The one or more properties or characteristics of the one or more objects can include a size (e.g., a height, length, and/or width), mass, weight, volume, color, and/or sound associated with the one or more objects.

The semantic processing system can generate, for example based in part on the state data and an object recognition model including a machine learned model, one or more semantic objects corresponding to the one or more objects. The semantic processing system can access a machine learned model (e.g., access a machine learned model that has been stored locally and/or a machine learned model that is stored on a remote computing device) that has been created using a classification dataset including classifier data that includes a set of classified features and a set of classified object labels associated with training data that can be based on, or associated with, a plurality of training objects (e.g., physical objects or simulated objects that are used as training inputs for the machine learned model). The classification dataset can be based in part on inputs from one or more sensors (e.g., cameras and/or microphones) that have been used to generate visual outputs and audio outputs based on the visual inputs and the audio inputs respectively. For example, the machine learned model can be created using a set of cameras and microphones that captured training data including video and audio of an urban area that includes various objects including buildings, streets, vehicles, people, and/or surfaces with text.

In some embodiments, the machine learned model can be based in part on one or more classification techniques comprising linear regression, logistic regression, random forest classification, boosted forest classification, gradient boosting, a neural network, a support vector machine, or a decision tree. Further, the semantic processing system can use various object recognition models or techniques to generate and/or process the one or more semantic objects, either in combination with the machine learned model or without the machine learned model. For example, the object recognition techniques can receive sensor data associated with one or more sensor outputs and can include one or more genetic algorithms, edge matching, greyscale matching, gradient matching, and/or pose clustering.
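
As a purely illustrative sketch, one of the classification techniques named above (a random forest) could be fit on a classification dataset of classified features and classified object labels; the scikit-learn API usage shown here is standard, but the feature layout and labels are assumptions and not part of the disclosure.

```python
# Assumes scikit-learn is available; feature vectors and labels are illustrative only.
from sklearn.ensemble import RandomForestClassifier

# Each row is a feature vector derived from sensor output (e.g., image features);
# each label is a classified object label associated with the training objects.
classified_features = [
    [0.12, 0.80, 0.33],
    [0.90, 0.10, 0.45],
    [0.15, 0.78, 0.30],
]
classified_object_labels = ["building", "vehicle", "building"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(classified_features, classified_object_labels)

# At run time, a new feature vector derived from camera output is mapped to an object label.
predicted_label = model.predict([[0.88, 0.12, 0.40]])[0]
print(predicted_label)  # e.g. "vehicle"
```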

The one or more semantic objects can include a set of attributes (e.g., a set of attributes for each of the one or more semantic objects). For example, the set of attributes associated with the one or more semantic objects can include one or more object identities including the identity of the one or more objects associated with the one or more semantic objects (e.g., the designer and style of an article of clothing); one or more object types associated with the type, category, or class of the one or more objects associated with the one or more semantic objects (e.g., a pair of trousers or a dress shirt can be associated with a clothing type); an object location including a geographic location associated with the one or more objects associated with the one or more semantic objects (e.g., an address of a building object); a monetary value (e.g., one or more prices associated with an object); an ownership status including the owner of an object (e.g., the owner of real property); and/or a set of physical characteristics (e.g., a size or mass associated with an object).

The semantic processing system can determine, based in part on the set of attributes of the one or more semantic objects, one or more operating modes associated with the one or more semantic objects. The one or more operating modes can determine the way in which the one or more semantic objects are processed and/or used by the semantic processing system. As such, the semantic processing system can selectively dedicate computing resources to a subset of possible operations based on the one or more attributes of the one or more semantic objects (e.g., detecting a poster that includes text can result in a determination that a text recognition mode will be used to process the one or more semantic objects associated with the poster).

The one or more operating modes can include a text recognition mode associated with recognizing textual information in the environment (e.g., recognizing when an object contains text); a location recognition mode associated with recognizing one or more locations in the environment (e.g., locating an entrance to a store); an object recognition mode associated with recognizing the one or more objects in the environment (e.g., recognizing an article of merchandise); and/or an event recognition mode associated with recognizing an occurrence of one or more events in the environment.
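
For illustration, the following is a minimal sketch of selecting one of the operating modes listed above from a semantic object's attributes; the enumeration, attribute names, and selection rules are hypothetical and are not a required implementation.

```python
from enum import Enum, auto

class OperatingMode(Enum):
    TEXT_RECOGNITION = auto()
    LOCATION_RECOGNITION = auto()
    OBJECT_RECOGNITION = auto()
    EVENT_RECOGNITION = auto()

def select_operating_mode(attributes: dict) -> OperatingMode:
    """Pick an operating mode from a semantic object's attributes (illustrative rules only)."""
    if attributes.get("contains_text"):
        return OperatingMode.TEXT_RECOGNITION      # e.g. a poster with text
    if attributes.get("object_type") == "location":
        return OperatingMode.LOCATION_RECOGNITION  # e.g. a store entrance
    if attributes.get("scheduled_event"):
        return OperatingMode.EVENT_RECOGNITION
    return OperatingMode.OBJECT_RECOGNITION        # default: recognize the object itself

# Example: a detected poster containing text is routed to the text recognition mode.
mode = select_operating_mode({"object_type": "poster", "contains_text": True})
```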

The semantic processing system can generate, based in part on the one or more operating modes, one or more object outputs associated with the one or more semantic objects. The one or more object outputs can include one or more outputs via one or more output devices of the semantic processing system (e.g., one or more display devices, audio devices, and/or haptic output devices). The text recognition mode can produce one or more object outputs that include text-related output including translations of text that is recognized (e.g., generating English text based on detection and translation of a Chinese text).

In some embodiments, the one or more object outputs can include one or more visual indications (e.g., one or more visual images produced by a display device of the semantic processing system) and/or one or more audio indications (e.g., one or more sounds produced by an audio output device of the semantic processing system). For example, the one or more object outputs can include a translation displayed on a display device, audio indications that include an audio version of a written text (e.g., text to speech), and/or one or more images that are superimposed on camera imagery of an environment.

The semantic processing system can determine, based in part on the set of attributes of the one or more semantic objects, object data that matches the one or more semantic objects. For example, the semantic processing system can match the set of attributes to the object data based on one or more comparisons between portions of the set of attributes and the object data. The object data can include information associated with one or more related objects (e.g., a semantic object for a ring can be associated with other articles of jewelry); one or more remote data sources (e.g., a semantic object for a book can be associated with a website associated with the author of the book); one or more locations; and/or one or more events.

The semantic processing system can access one or more portions of the object data that match the one or more semantic objects. For example, the semantic processing system can access one or more portions of the object data that are stored on one or more remote computing devices. In some embodiments, the one or more object outputs can be based in part on the one or more portions of the object data that match the one or more semantic objects. For example, when the object data includes links to one or more remote computing devices that are associated with the one or more semantic objects, the one or more object outputs can include those links.
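
A minimal sketch of matching a semantic object's attributes against object data records and collecting any associated links for the object output follows; the record layout and matching rule are assumptions made for illustration, not a prescribed format.

```python
def match_object_data(attributes: dict, object_data_records: list) -> list:
    """Return object data records whose fields agree with a portion of the semantic object's attributes."""
    matches = []
    for record in object_data_records:
        # A record matches if every attribute it specifies agrees with the semantic object.
        if all(attributes.get(key) == value for key, value in record.get("match", {}).items()):
            matches.append(record)
    return matches

# Example: object data for a recognized book, including a link to a remote data source.
records = [
    {"match": {"object_type": "book", "title": "Example Title"},
     "links": ["https://example.com/author"]},
]
matched = match_object_data({"object_type": "book", "title": "Example Title"}, records)
links_for_output = [link for record in matched for link in record.get("links", [])]
```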

The semantic processing system can generate, based in part on the state data or the one or more semantic objects, one or more interface elements associated with the one or more objects. The one or more interface elements can include one or more images (e.g., graphical user interface elements including pictograms and/or text) responsive to one or more inputs (e.g., the one or more interface elements can initiate or trigger one or more operations based on a haptic input and/or an audio input). For example, the one or more interface elements can include a status indicator (e.g., a status bar) that can provide a continuous indication of the status of the one or more objects. In some embodiments, recognition of the one or more objects can be performed as a continuous process (e.g., continuous recognition of the one or more objects) so that the one or more objects (e.g., sensor output including visual and/or audio sensor output associated with the one or more objects) can be detected, identified, and/or recognized in real time, and the one or more interface elements including the status indicator can also be updated continuously (e.g., as the one or more objects are recognized in real time). Further, the one or more interface elements can be used to provide navigational instructions (e.g., textual or audio instructions associated with a path to a location) and other information related to the one or more objects in the environment.

Thus, in some embodiments, the semantic processing system can continuously operate in the background to recognize objects. Upon recognizing one or more objects, the semantic processing system can provide a status indicator in a status bar of the user interface. The status indicator can indicate that an object has been recognized and, in some embodiments, can further indicate the type of object that has been recognized. The status indicator in the status bar can provide a non-intrusive visual indication that additional semantic information for an object is available. If interested in receiving the additional semantic information, the user can interact with the status indicator (e.g., by tapping or dragging down) and the additional information (e.g., in the form of additional interface elements) can be displayed within the user interface.

In response to receiving one or more inputs to the one or more interface elements, the semantic processing system can determine one or more remote computing devices that include at least a portion of the object data (e.g., one or more remote computing devices that store some part of the object data). The one or more object outputs can include one or more remote source indications associated with the one or more remote computing devices that comprise at least a portion of the object data (e.g., IP addresses associated with the one or more remote computing devices).

The semantic processing system can determine, based in part on the state data or the one or more semantic objects, the one or more objects that comprise one or more semantic symbols (e.g., one or more graphemes including one or more letters, one or more logograms, one or more syllabic characters, and/or one or more pictograms). Based in part on the one or more semantic symbols, the semantic processing system can determine one or more words associated with the one or more semantic symbols (e.g., using dictionary data, certain combinations of the one or more semantic symbols can be associated with words). In some embodiments, the set of attributes of the one or more semantic objects can include the one or more words. For example, the semantic object for a poster with text indicating "Concert at 8:00 p.m. at the Civic center" can be a poster semantic object that includes a set of attributes that includes concert as the value for an event type attribute, 8:00 p.m. as the value for an event time attribute, and Civic center, or a geographic coordinate associated with the Civic center, as the value for the location attribute.
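
As an illustration only, a small sketch of how recognized words from such a poster might be mapped to event attributes; the regular expressions and attribute keys below are assumptions made for the example and are not part of the disclosure.

```python
import re

def poster_attributes(recognized_text: str) -> dict:
    """Derive illustrative event attributes from words recognized on a poster."""
    attributes = {}
    if re.search(r"\bconcert\b", recognized_text, re.IGNORECASE):
        attributes["event_type"] = "concert"
    time_match = re.search(r"\b(\d{1,2}:\d{2}\s*[ap]\.?m\.?)", recognized_text, re.IGNORECASE)
    if time_match:
        attributes["event_time"] = time_match.group(1)
    location_match = re.search(r"\bat the ([A-Z][\w ]+)$", recognized_text.strip())
    if location_match:
        attributes["event_location"] = location_match.group(1)
    return attributes

# Example: {'event_type': 'concert', 'event_time': '8:00 p.m.', 'event_location': 'Civic center'}
print(poster_attributes("Concert at 8:00 p.m. at the Civic center"))
```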

The semantic processing system can determine a detected language that is associated with the one or more semantic symbols. For example, based in part on the combinations of the one or more semantic symbols (e.g., words associated with the one or more semantic symbols), the semantic processing system can determine the language (e.g., a language including English, Russian, Chinese, and/or French) that is associated with the one or more semantic symbols.

The semantic processing system can generate, based in part on translation data, a translated output when the detected language is not associated with a default language (e.g., a language that a user of the semantic processing system has selected as being the language into which the detected language is translated when the detected language is not the same as the default language). The translation data can include one or more semantic symbols in the default language and one or more semantic symbols in the detected language. The semantic processing system can compare the one or more semantic symbols in the detected language to the one or more semantic symbols in the default language and perform an analysis to translate the detected language.

The translated output can include the one or more semantic symbols in the default language that correspond to a portion of the one or more semantic symbols in the detected language (e.g., a multi-language dictionary that includes a listing of one or more words in the default language, each of which is associated with the corresponding word in the detected language). In some embodiments, the one or more object outputs can be based in part on the translated output (e.g., the one or more object outputs can include a visual indication or an audio indication of the translation).
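
A minimal sketch of the translate-only-when-needed flow described above, using a toy word-for-word dictionary as the translation data; the dictionary contents and the detect_language helper are assumptions for illustration rather than an actual language detection method.

```python
# Toy translation data: detected-language words mapped to default-language (English) words.
TRANSLATION_DATA = {"fr": {"bonjour": "hello", "sortie": "exit"}}

def detect_language(words: list) -> str:
    """Illustrative stand-in for language detection over recognized words."""
    return "fr" if any(word in TRANSLATION_DATA["fr"] for word in words) else "en"

def translated_output(words: list, default_language: str = "en") -> list:
    detected = detect_language(words)
    if detected == default_language:
        return words  # no translation needed when the detected and default languages match
    dictionary = TRANSLATION_DATA.get(detected, {})
    # Word-for-word lookup; unknown words are passed through unchanged.
    return [dictionary.get(word, word) for word in words]

print(translated_output(["sortie"]))  # ['exit']
```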

The semantic processing system can receive location data that includes information associated with a current location of the environment and a destination location (e.g., a destination location selected by a user of the semantic processing system). Further, the semantic processing system can determine, based in part on the location data and the state of the one or more objects within a field of view of the one or more sensors, a path from the current location to the destination location (e.g., a path between the current location and the destination location that avoids intervening obstacles).

Further, the semantic processing system can generate one or more directions based in part on the one or more semantic objects and the path from the current location to the destination location. Further, the semantic processing system can determine one or more semantic objects that can be used as landmarks associated with the one or more directions (e.g., a semantic object associated with a lamppost can be used as part of the one or more directions "turn left at the lamppost in front of you"). In some embodiments, the one or more object outputs can be based in part on the one or more directions (e.g., the one or more visual indications or the one or more audio indications can include directions).
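
For illustration, a small sketch of enriching turn-by-turn directions with landmark semantic objects that lie near each turn; the path representation, distance threshold, and phrasing below are assumptions, not a prescribed navigation algorithm.

```python
import math

def nearest_landmark(point: tuple, landmarks: list, max_distance: float = 20.0):
    """Return the landmark semantic object closest to a turn point, if any lies within range (meters)."""
    best, best_distance = None, max_distance
    for landmark in landmarks:
        distance = math.dist(point, landmark["position"])
        if distance < best_distance:
            best, best_distance = landmark, distance
    return best

def directions_with_landmarks(turns: list, landmarks: list) -> list:
    directions = []
    for turn in turns:  # each turn: {"instruction": "turn left", "position": (x, y)}
        landmark = nearest_landmark(turn["position"], landmarks)
        if landmark is not None:
            directions.append(f"{turn['instruction']} at the {landmark['object_type']} in front of you")
        else:
            directions.append(turn["instruction"])
    return directions

turns = [{"instruction": "turn left", "position": (10.0, 5.0)}]
landmarks = [{"object_type": "lamppost", "position": (11.0, 5.0)}]
print(directions_with_landmarks(turns, landmarks))  # ['turn left at the lamppost in front of you']
```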

In some embodiments, the semantic processing system can determine one or more relevance values corresponding to the one or more semantic objects. The one or more relevance values can be based in part on an extent to which each of the one or more semantic objects is associated with context data. The context data can include various characteristics associated with the environment including data associated with a time of day; a current location (e.g., a latitude and longitude associated with the environment); one or more scheduled events (e.g., one or more events that will occur within a predetermined period of time); one or more user locations; or one or more user preferences (e.g., one or more preferences of a user including food preferences, musical preferences, and/or entertainment preferences). In some embodiments, the one or more object outputs can be based in part on the one or more relevance values that correspond to the one or more semantic objects.
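
A minimal sketch of one way a relevance value could be computed, as the fraction of context signals that a semantic object's attributes match; the scoring scheme and field names are assumptions made for the example.

```python
def relevance_value(semantic_object: dict, context: dict) -> float:
    """Score a semantic object by the share of context signals its attributes match (0.0 to 1.0)."""
    signals = 0
    matched = 0
    for key, context_value in context.items():
        signals += 1
        if semantic_object.get("attributes", {}).get(key) == context_value:
            matched += 1
    return matched / signals if signals else 0.0

# Example: a restaurant semantic object scored against the user's food preference and location.
restaurant = {"attributes": {"cuisine": "noodles", "neighborhood": "downtown"}}
context = {"cuisine": "noodles", "neighborhood": "uptown", "time_of_day": "evening"}
print(relevance_value(restaurant, context))  # 1 of 3 signals match -> 0.333...
```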

The semantic processing system can modify, based in part on the state data or the semantic data, the one or more visual indications or the one or more audio indications. Modifying the one or more visual indications or the one or more audio indications can include transforming the one or more visual indications into one or more modified audio indications (e.g., generating artificial speech based on text); transforming the one or more audio indications into one or more modified visual indications (e.g., generating text based on audio inputs to a microphone); modifying a size of the one or more visual indications (e.g., increasing the size of text captured by a camera); modifying one or more color characteristics of the one or more visual indications (e.g., generating a highlight around the one or more visual indications); and/or modifying an amplitude of the one or more audio indications (e.g., increasing the volume of one or more audio indications). Such modifications of the one or more visual indications and/or the one or more audio indications can be used to enhance any user's experience and can be particularly useful for individuals with visual or hearing impairments. For example, the semantic processing system can enhance the size and clarity of text that would otherwise be unreadable for an individual with a visual impairment.

One example aspect of the present disclosure is directed to a mobile device that includes a display. In some embodiments, a plane of the display can define a first plane of the mobile device. The mobile device can include a camera arranged to capture one or more images from a direction parallel to the first plane of the mobile device. The mobile device can include a processor configured to receive an image captured by the camera, recognize one or more objects present in the received image, and control an output of the display based on one or more recognized objects in the received image.

In some embodiments, the processor is configured to control the display to output a user-interface element in response to one or more recognized objects. The user-interface element can be displayed over one or more user-interface elements already being displayed by the display. The user-interface element output, in response to one or more recognized objects, can comprise a bar element displayed at a top end of the display when the output of the display has a portrait orientation. In some embodiments, the processor is configured to recognize a hazard, and the output user-interface element comprises a warning message. In some embodiments, the processor is further configured to determine a location of the mobile device, based on one or more objects recognized in the received image, and control the output of the display based on the determined location of the mobile device.

In some embodiments, the display is a rectangular shape, and the camera is arranged to capture one or more images from a direction which is parallel to a long axis of the display. The camera can be configured to capture a plurality of images sequentially at a preset interval, and the processor can be configured to receive each of the plurality of images captured by the camera.

In some embodiments, the camera can be configured to capture the plurality of images according to whether or not the display of the mobile device is active. The mobile device can comprise a character recognition unit. The character recognition unit can be configured to receive a text object recognized in the received image from the processor; determine a text string from the received text object; and/or send the determined text string to the processor. Further, the processor can be configured to control the output of the display based on the determined text string.

In some embodiments, the mobile device can include a language unit. The language unit can be configured to receive the text string determined by the character recognition unit from the processor, convert the text string to a translated text string in a second language, and/or send the translated text string to the processor. The processor can be configured to control the output of the display based on the translated text string.
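
Purely as an illustration of the flow among the processor, character recognition unit, and language unit described above, the following sketch wires the units together as plain Python callables; the class names, method signatures, and toy dictionary are assumptions rather than a prescribed design.

```python
class CharacterRecognitionUnit:
    def determine_text_string(self, text_object: bytes) -> str:
        """Illustrative stand-in for OCR over a text object cropped from the received image."""
        return text_object.decode("utf-8")  # placeholder: a real unit would run OCR here

class LanguageUnit:
    def translate(self, text_string: str, second_language: str) -> str:
        """Illustrative stand-in for converting a text string to a second language."""
        toy_dictionary = {("sortie", "en"): "exit"}
        return toy_dictionary.get((text_string, second_language), text_string)

def handle_text_object(text_object: bytes, display) -> None:
    """Processor-side flow: determine the text string, translate it, then control the display output."""
    text_string = CharacterRecognitionUnit().determine_text_string(text_object)
    translated = LanguageUnit().translate(text_string, second_language="en")
    display(translated)

handle_text_object(b"sortie", display=print)  # prints "exit"
```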

In some embodiments, the mobile device can include an audio output unit. The processor can be configured to control an output of the audio output unit based on one or more recognized objects in the received image.

Another example aspect of the present disclosure is directed to a method of operating a mobile device. The method can include receiving an image captured by a camera of the mobile device in which the camera is arranged to capture one or more images from a direction which is parallel to a first plane of the mobile device, as defined by a plane of a display of the mobile device; recognizing one or more objects present in the received image; and/or controlling an output of the display of the mobile device based on one or more recognized objects in the received image.

In some embodiments, receiving the image can include receiving a plurality of images captured sequentially by the camera at a preset interval. In some embodiments, receiving the plurality of images can include receiving the plurality of images captured by the camera according to whether or not the display of the mobile device is active. The method can include controlling the display to output a user-interface element in response to one or more recognized objects. The user-interface element can be displayed over one or more user-interface elements already being displayed by the display. In some embodiments, the user-interface element output in response to one or more recognized objects can comprise a bar element displayed at a top end of the display when the output of the display has a portrait orientation. Recognizing one or more objects can comprise recognizing a hazard, and the output user-interface element can comprise a warning message.

In some embodiments, the method can include determining a location of the mobile device based on one or more objects recognized in the received image, and controlling the output of the display based on the determined location of the mobile device. The method can include recognizing a text object in the received image; determining a text string from the recognized text object; and/or controlling the output of the display based on the determined text string.

In some embodiments, the method can include converting the determined text string to a translated text string in a second language and controlling the output of the display based on the translated text string. In some embodiments, the method can include controlling an output of an audio output unit of the mobile device based on one or more recognized objects in the received image.

Another example aspect of the present disclosure is directed to a computer-readable medium comprising a program which, when executed by a processor, performs a method of operating a mobile device. The method performed by the program can include receiving an image captured by a camera of the mobile device in which the camera is arranged to capture one or more images from a direction which is parallel to a first plane of the mobile device, as defined by a plane of a display of the mobile device; recognizing one or more objects present in the received image; and/or controlling an output of the display of the mobile device based on one or more recognized objects in the received image.

In some embodiments, receiving the image can include receiving a plurality of images captured sequentially by the camera at a preset interval. In some embodiments, receiving the plurality of images can include receiving the plurality of images captured by the camera according to whether or not the display of the mobile device is active. In some embodiments, the method performed by the program can include controlling the display to output a user-interface element in response to one or more recognized objects. The user-interface element can be displayed over one or more user-interface elements already being displayed by the display. In some embodiments, the user-interface element output in response to one or more recognized objects can comprise a bar element displayed at a top end of the display when the output of the display has a portrait orientation.

In some embodiments, recognizing one or more objects can comprise recognizing a hazard, and the output user-interface element can comprise a warning message. In some embodiments, the method performed by the program can include determining a location of the mobile device, based on one or more objects recognized in the received image, and controlling the output of the display based on the determined location of the mobile device. In some embodiments, the method performed by the program can include recognizing a text object in the received image, determining a text string from the recognized text object, and/or controlling the output of the display based on the determined text string.

In some embodiments, the method performed by the program can include converting the determined text string to a translated text string in a second language, and/or controlling the output of the display based on the translated text string. In some embodiments, the method performed by the program can include controlling an output of an audio output unit of the mobile device based on one or more recognized objects in the received image.

The systems, methods, devices, and non-transitory computer-readable media in the disclosed technology can provide a variety of technical effects and benefits to the overall process of recognizing an environment based on sensor outputs from one or more sensors, generating one or more semantic objects based on the sensor outputs, and performing one or more actions based on the one or more semantic objects. The disclosed technology can reduce or eliminate the need for a user to engage in manual interaction to gather information about their environment and the objects in that environment. The reductions in manual interaction can result from automated processing of sensor data that can persistently monitor the state of the environment, determine an optimal operational mode, and generate indications in a more efficient manner (e.g., using fewer steps to produce an output). In situations in which manual selection is still used, the disclosed technology can reduce the amount of human intervention by performing commonly used functions including translation, image recognition, and association of semantic data with external data sources more rapidly than without the assistance of the disclosed technology (e.g., by eliminating one or more steps performed in the different functions).

By changing operating mode based on conditions in the environment, the disclosed technology can maximize the use of computing resources by selectively activating sensors and selectively performing various operations. For example, by determining an operating mode to use and one or more specific actions to perform (e.g., text translation), the disclosed technology can avoid the excessive resource usage (e.g., battery power and/or network transmissions) that can result from a more haphazard approach that does not include generation and analysis of semantic objects associated with an environment. Additionally, the disclosed technology can leverage the power of a machine learned model, including a locally stored machine learned model that can be accessed without the need to use network resources (e.g., network bandwidth to contact a machine learned model that is stored on a remote computing device).

In this way, the disclosed technology is able to reduce or otherwise improve the efficiency of a user's interaction with a device. By changing operating mode and/or performing one or more actions based on the environment and one or more semantic objects associated with the environment, without the intervention of a user, the disclosed technology can lead a user to a desired information result or action in a shorter amount of time, or with fewer interaction steps. Hence, particularly in the field of mobile devices, the disclosed technology can lead to a reduction in the power consumption demands associated with screen-on time and with processor usage; these power consumption demands can be of particular importance in a mobile device. The disclosed technology can also reduce the demands for processing time associated with processing a user input query, and processing a response to such a query. By increasing the number of instances in which a user can be provided with a desired information result or action without processing and responding to a user input query, the disclosed technology can, over time, result in significant savings of power and processing resources. By extension, by reducing the number of instances in which a query must be sent to a remote computing device, the disclosed technology can provide efficiencies in network usage across a system of mobile devices which implement the disclosed technology.

The disclosed technology also offers the benefits of being able to be configured with various sensors (e.g., a periscopic camera) positioned in a way that is more ergonomic for a user (e.g., more ergonomic for a user to hold) and that capture a wider field of view of the environment surrounding the user. Sensors, such as a periscopic camera, may be positioned on a device in a way that improves the passive collection of sensor data from the environment, based on a normal or natural holding configuration of the device, such that the sensors can persistently monitor the state of the environment without an active gesture or action by a user of the device. Further, the disclosed technology can use semantic objects based on data captured from local sensors to enrich directions in pathfinding applications, which can be displayed in one or more interface elements (e.g., a status bar indicator that includes a pathfinding indicator to indicate pathfinding is being performed and/or an object recognition indicator to indicate that object recognition is being performed). For example, the disclosed technology can use local landmarks or other objects within view of a camera on the device as cues to enhance directions.

Accordingly, the disclosed technology provides more effective sensor based semantic object generation in a variety of environments along with the added benefits of lower resource usage (e.g., improved utilization of battery and network resources) that result from a semantic object driven approach to gathering and processing the state of the environment.

Reference now will be made in detail to embodiments, one or more examples of which are illustrated in the drawings. Each example is provided by way of explanation of the embodiments, not limitation of the present disclosure. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments without departing from the scope or spirit of the present disclosure. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that aspects of the present disclosure cover such modifications and variations.

With reference now to FIGS. 1-17, example aspects of the present disclosure will be disclosed in greater detail. FIG. 1 depicts a diagram of an example system 100 according to example embodiments of the present disclosure. The system 100 can include a user device 102; a remote computing device 104; a communication network 106; an object recognition component 110; object data 114 (e.g., data associated with one or more physical objects and/or one or more semantic objects); and a geographic information system 120.

The user device 102 can receive object data (e.g., information associated with one or more objects detected or recognized by the user device 102) from the remote computing device 104 via a communication network 106. The object recognition component 110, which can operate or be executed on the user device 102, can interact with the remote computing device 104 via the network 106 to perform one or more operations including detection and/or recognition of one or more objects; generation of one or more semantic objects; and/or generation of one or more outputs (e.g., physical outputs including visual indications, audio indications, and/or haptic indications). In some embodiments, the object recognition component 110 can include a machine learned model that can be used to detect and/or recognize objects and which can also be used in the generation of one or more semantic objects. The network 106 can include any type of communications network, such as a local area network (e.g., intranet), wide area network (e.g., Internet), cellular network, or some combination thereof. The network 106 can also include a direct connection. In general, communication can be carried via network 106 using any type of wired and/or wireless connection, using a variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML or XML), and/or protection schemes (e.g., VPN, secure HTTP, or SSL).

The user device 102 can include one or more computing devices including a tablet computing device, a device that is able to be worn (e.g., a smart watch or a smart band), a laptop computing device, a desktop computing device, a mobile computing device (e.g., a smartphone), and/or a display device with one or more processors.

The object recognition component 110 can be implemented on the user device 102. The object recognition component 110 can implement object detection and/or recognition of one or more objects. Further, the object recognition component 110 can assist in the generation of one or more semantic objects based on one or more sensory outputs from one or more sensors (not shown). The sensory outputs can be associated with one or more images or sounds associated with one or more objects in an environment. The object recognition component 110 can be operated or executed locally on the user device 102, through a web application accessed via a web browser implemented on the user device 102, or through a combination of local execution or operation on the user device 102 and remote execution or operation on a remote computing device, which can include the remote computing device 104 or the geographic information system 120.

The object recognition component 110 can be configured to generate, process, or modify data including image data (e.g., image files), audio data (e.g., sound files), and/or navigational data (e.g., the location of places of interest associated with the image data) that can be used by a user.

In some embodiments, the remote computing device 104 can include one or more computing devices including servers (e.g., web servers). The one or more computing devices can include one or more processors and one or more memory devices. The one or more memory devices can store computer-readable instructions to implement, for example, one or more applications that are associated with the object data 114. In some embodiments, the object data 114 can be associated, for instance, with the geographic information system 120.

The geographic information system 120 can be associated with or include data that is indexed according to geographic coordinates (e.g., latitude and longitude) of its constituent elements (e.g., locations). The data associated with the geographic information system 120 can include map data, image data, geographic imagery, and/or data associated with various waypoints (e.g., addresses or geographic coordinates). The object data 114, as determined or generated by the remote computing device 104, can include data associated with the state or characteristics of one or more objects and/or one or more semantic objects including, for example, object identifiers (e.g., location names and/or names of objects), prices of objects, locations of objects, and/or ownership of objects.

FIG. 2 depicts an example computing device 200 that can be configured to generate semantic objects and an output based on the detection or recognition of the state of an environment that includes objects according to example embodiments of the present disclosure. The computing device 200 can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 and/or the remote computing device 104, which are shown in FIG. 1. As shown, the computing device 200 can include a memory 204; an object recognition component 212 that can include one or more instructions that can be stored on the memory 204; one or more processors 220 configured to execute the one or more instructions stored in the memory 204; a network interface 222 that can support network communications; one or more mass storage devices 224 (e.g., a hard disk drive or a solid state drive); one or more output devices 226 (e.g., one or more display devices); a sensor array 228 (e.g., one or more optical and/or audio sensors); one or more input devices 230 (e.g., one or more touch detection surfaces); and/or one or more interconnects 232 (e.g., a bus used to transfer one or more signals or data between computing components in a computing device). The one or more processors 220 can include any processing device that can, for example, process and/or exchange (send or receive) one or more signals or data associated with a computing device.

For example, the one or more processors 220 can include single or multiple core devices including a microprocessor, microcontroller, integrated circuit, and/or logic device. The memory 204 and the one or more mass storage devices 224 are illustrated separately; however, the components 204 and 224 can be regions within the same memory module. The computing device 200 can include one or more additional processors, memory devices, and/or network interfaces, which may be provided separately or on the same chip or board. The components 204 and 224 can include one or more computer-readable media, including, but not limited to, non-transitory computer-readable media, RAM, ROM, hard drives, flash drives, and/or other memory devices.

The memory 204 can store sets of instructions for applications including an operating system that can be associated with various software applications or data. The memory 204 can be used to operate various applications including a mobile operating system developed specifically for mobile devices. As such, the memory 204 can perform functions that allow the software applications to access data including wireless network parameters (e.g., identity of the wireless network, quality of service), and invoke various services including telephony, location determination (e.g., via global positioning service (GPS) or WLAN), and/or wireless network data call origination services. In other implementations, the memory 204 can be used to operate or execute a general-purpose operating system that operates on both mobile and stationary devices, such as smartphones and desktop computers, for example. In some embodiments, the object recognition component 212 can include a machine learned model that can be used to detect and/or recognize objects. Further, the object recognition component can be used in the generation of one or more semantic objects.

The sensor array 228 can include one or more sensors that can detect changes in the state of an environment that includes one or more objects. For example, the sensor array 228 can include one or more optical sensors, motion sensors, thermal sensors, audio sensors, haptic sensors, pressure sensors, humidity sensors, and/or electromagnetic sensors. The one or more input devices 230 can include one or more devices for entering input into the computing device 200 including one or more touch sensitive surfaces (e.g., resistive and/or capacitive touch screens), keyboards, mouse devices, microphones, and/or stylus devices. The one or more output devices 226 can include one or more devices that can provide a physical output including visual outputs, audio outputs, and/or haptic outputs. For example, the one or more output devices 226 can include one or more display components (e.g., LCD monitors, OLED monitors, and/or indicator lights), one or more audio components (e.g., loud speakers), and/or one or more haptic output devices that can produce movements including vibrations.

The software applications that can be operated or executed by the computing device 200 can include the object recognition component 110 shown in FIG. 1. Further, the software applications that can be operated or executed by the computing device 200 can include native applications or web-based applications.

In some implementations, the user device 102 can be associated with or include a positioning system (not shown). The positioning system can include one or more devices or circuitry for determining the position of a device. For example, the positioning system can determine actual or relative position by using a satellite navigation positioning system (e.g., a GPS system, a Galileo positioning system, the GLObal Navigation Satellite System (GLONASS), or the BeiDou Satellite Navigation and Positioning system), an inertial navigation system, a dead reckoning system, an IP address, triangulation and/or proximity to cellular towers, Wi-Fi hotspots, or beacons, and/or other suitable techniques for determining position. The positioning system can determine a user location of the user device. The user location can be provided to the remote computing device 104 for use by the object data provider in determining travel data associated with the user device 102.

The one or more interconnects 232 can include one or more interconnects or buses that can be used to exchange (e.g., send and/or receive) one or more signals (e.g., electronic signals) and/or data between components of the computing device 200 including the memory 204, the object recognition component 212, the one or more processors 220, the network interface 222, the one or more mass storage devices 224, the one or more output devices 226, the sensor array 228, and/or the one or more input devices 230. The one or more interconnects 232 can be arranged or configured in different ways including as parallel or serial connections. Further, the one or more interconnects 232 can include one or more internal buses to connect the internal components of the computing device 200, and one or more external buses used to connect the internal components of the computing device 200 to one or more external devices. By way of example, the one or more interconnects 232 can include different interfaces including Industry Standard Architecture (ISA), Extended ISA, Peripheral Components Interconnect (PCI), PCI Express, Serial AT Attachment (SATA), HyperTransport (HT), USB (Universal Serial Bus), Thunderbolt, and/or IEEE 1394 interface (FireWire).

FIG. 3 depicts an example of sensor based semantic object generation including image capture according to example embodiments of the present disclosure. FIG. 3 includes an illustration of an environment 300, one or more portions of which can be detected, recognized, and/or processed by one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of the environment 300 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 3, the environment 300 includes a semantic processing system 310, a display component 312, an edge portion 314, an object 320, and a text portion 322.

The display component 312 of the semantic processing system 310 can display one or more images of an environment, including the environment 300. The one or more images displayed by the display component 312 can be captured by one or more sensors (e.g., one or more cameras) of the semantic processing system 310. In this example, the display component 312 uses a camera (e.g., a periscopic camera) positioned on the edge portion 314 of the semantic processing system 310 that captures an image of an object 320, which is a poster with text in a combination of languages (English and Chinese). In some embodiments, the one or more sensors can be located anywhere on the semantic processing system 310. Further, the semantic processing system 310 can receive sensory outputs from one or more external devices (e.g., a remote camera can provide video imagery to the semantic processing system 310).

The semantic processing system 310 can output one or more images of the object 320, including the text portion 322, on the display component 312. As illustrated in FIG. 3, the disclosed technology can output images of an environment onto a display component of a device that can receive one or more inputs from a user.

FIG. 4 depicts an example of sensor based semantic object generation including audio generation according to example embodiments of the present disclosure. FIG. 4 includes an illustration of an environment 400, one or more portions of which can be detected, recognized, and/or processed by one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including a semantic processing system audio output component 410 that can include one or more portions of the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of the environment 400 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 4, the environment 400 includes the semantic processing system audio output component 410.

The semantic processing system audio output component 410 can include one or more components that can output sounds, including outputting sounds via one or more speakers of the semantic processing system audio output component 410. For example, the semantic processing system audio output component 410 can receive one or more signals (e.g., one or more signals including data) from a system or device such as the user device 102 or the computing device 200. The one or more signals can be transmitted wirelessly or via wire and received by a receiving component (not shown) of the semantic processing system audio output component 410. The one or more signals can include data associated with one or more indications about the state of an environment that includes one or more objects. For example, the one or more signals can include audio that is based on a portion of text that was recognized (e.g., text to speech translation) or directions to a location (e.g., audio instructions of directions to a destination location).

FIG. 5 depicts an example of sensor based semantic object generation including text translation according to example embodiments of the present disclosure. FIG. 5 includes an illustration of a semantic processing system 500 that can include one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 500 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 5, the semantic processing system 500 includes a display component 510 and a text portion 512.

The semantic processing system 500 can display one or more images of an environment that includes one or more objects on a display component 510. The one or more images can be captured by one or more sensors (not shown) of the semantic processing system 500. In this example, the display component 510 outputs a display of a poster with text in a combination of languages (English and Chinese). The semantic processing system 500 can generate a semantic object corresponding to the text detected in the environment, translate the text, and output the text portion 512 that is shown on the display component 510. For example, the semantic processing system 500 can superimpose the translated English text (“Qingdao Daily”) over the Chinese text captured by the semantic processing system 500.

FIG. 6 depicts an example of sensor based semantic object generation including text recognition according to example embodiments of the present disclosure. FIG. 6 includes an illustration of an environment 600, one or more portions of which can be detected, recognized, and/or processed by one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including a semantic processing system 610 that can include one or more portions of the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of the environment 600 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 6, the environment 600 includes the semantic processing system 610, an object 620, and a text portion 622.

The semantic processing system 610 can capture one or more images via one or more sensors (e.g., one or more cameras). The semantic processing system 610 can include one or more periscopic cameras (not shown) that can be positioned on the semantic processing system 610 so that the wide field of view of the one or more periscopic cameras can capture the state of the environment 600 including the object 620 (e.g., a poster) that includes a text portion 622 (“Juanita de Flor”). The positioning of the one or more periscopic cameras allows a user of the semantic processing system 610 to capture one or more images of one or more objects in an environment while holding the semantic processing system 610 in an ergonomically comfortable position.

FIG. 7 depicts an example of sensor based semantic object generation including text recognition according to example embodiments of the present disclosure. FIG. 7 includes an illustration of a semantic processing system 700 that can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 700 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 7, the semantic processing system 700 includes a display component 710, an image object 712, and an interface element 714.

The semantic processing system 700 can display one or more images of an environment that includes one or more objects on the display component 710. The one or more images displayed on the display component 710 can be captured by one or more sensors (not shown) of the semantic processing system 700. In this example, the display component 710 outputs the image object 712 that includes a visual representation of a portion of a poster with text (“Juanita de Flor”). The semantic processing system 700 can recognize that the object (e.g., the poster) associated with the image object 712 includes text and can generate a semantic object based on the image object 712 (e.g., a semantic object based on recognizing an object). Based on the semantic object, the semantic processing system 700 can determine that the image object 712 is associated with a musician, “Juanita de Flor,” and can access a remote computing device (e.g., the remote computing device 104) that includes data (e.g., a music audio file) associated with the semantic object that was generated. Based on the identity of the semantic object (e.g., the musician's name), the semantic processing system 700 can generate one or more interface elements, including the interface element 714, on the display component 710 that will allow a user to access or control information related to the semantic object. For example, the interface element 714 can be used to copy a music audio file associated with the semantic object generated by the semantic processing system 700.

FIG. 8 depicts an example of sensor based semantic object generation including object recognition according to example embodiments of the present disclosure. FIG. 8 includes an illustration of an environment 800, one or more portions of which can be detected, recognized, and/or processed by one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including a semantic processing system that can include one or more portions of the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of the environment 800 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 8, the environment 800 includes a semantic processing system 810, a display component 812, an object 820, and an object label 822.

The display component 812 of the semantic processing system 810 can display one or more images based on the environment 800. The one or more images displayed by the display component 812 can be captured by one or more sensors (not shown) of the semantic processing system 810. The semantic processing system 810 can capture an image of the object 820, which is a handbag. The semantic processing system 810 can generate a semantic object based on recognizing that the object 820 is a handbag. The semantic processing system 810 can detect the object label 822 and, based on detecting the object label 822, can generate one or more attributes of the semantic object associated with the object 820, including, for example, an object brand attribute that can be assigned a value based on the brand of the object 820 that is determined by the semantic processing system 810. For example, to determine the value of the object brand attribute, the semantic processing system 810 can access a remote computing system that can include data associated with the object brand attribute, and can use the data to associate a value (e.g., the brand of the handbag maker) with the object brand attribute.
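
A minimal sketch, assuming the brand data has already been retrieved from a remote computing system into a local dictionary, of how a detected label could be matched against that data to assign a value to an object brand attribute; the names and the matching rule are illustrative only.

    def assign_brand_attribute(semantic_object, label_text, brand_keywords):
        # brand_keywords stands in for data retrieved from a remote computing system,
        # mapping a brand name to keywords that may appear on an object label.
        for brand, keywords in brand_keywords.items():
            if any(keyword.lower() in label_text.lower() for keyword in keywords):
                semantic_object["attributes"]["object_brand"] = brand
                return semantic_object
        semantic_object["attributes"]["object_brand"] = "unknown"
        return semantic_object

    # Example usage with made-up data:
    bag = {"object_type": "handbag", "attributes": {}}
    assign_brand_attribute(bag, "EB Classic Tote", {"ExampleBrand": ["eb classic", "exbr"]})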

FIG. 9 depicts an example of sensor based semantic object generation including object recognition according to example embodiments of the present disclosure. FIG. 9 includes an illustration of a semantic processing system 900 that can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 900 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 9, the semantic processing system 900 includes a display component 910, an image object 920, an image object portion 922, an object identifier 924, and an interface element 926.

The semantic processing system 900 can display one or more images of an environment (e.g., an environment including one or more objects) on a display component 910. The one or more images can be captured by one or more sensors (e.g., one or more cameras) of the semantic processing system 900, which can be located on one or more portions of the semantic processing system 900. In this example, the display component 910 outputs a display of the image object 920. The semantic processing system 900 can recognize that the image object 920 is a handbag that includes an object label (the image object portion 922). The semantic processing system 900 can generate a semantic object attribute based on the object label. Based on the attributes of the semantic object (e.g., the object is a handbag with a label from a particular manufacturer), the semantic processing system 900 can generate display output including the object identifier 924 (“Bag”) and interface elements, including the interface element 926. The interface element 926 can be a control element that, upon activation by a user (e.g., touching the interface element 926 and/or issuing a voice command directed at the interface element 926), can perform one or more actions including accessing an Internet web site that sells goods or services including the object 920 and/or providing more information about the object 920.

FIG. 10 depicts an example of sensor based semantic object generation including location identification according to example embodiments of the present disclosure. FIG. 10 includes an illustration of a semantic processing system 1000 that can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 1000 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 10, the semantic processing system 1000 includes a display component 1010, an object 1020, an object 1022, an object 1024, and an object 1026.

In this example, a display component 1010 of the semantic processing system 1000 displays an environment that includes one or more objects (e.g., people, a building, a street, and vehicles) that is captured by a camera (not shown) of the semantic processing system 1000. The display component 1010 shows objects that have been detected and/or recognized by the semantic processing system 1000, including the object 1020 that is determined to be a street address; the object 1022 that is determined to be signage associated with a service (a transportation service); the object 1024 that is determined to be a face; and the object 1026 that is determined to be signage associated with a service (a restaurant).

The semantic processing system 1000 can generate semantic objects based on the objects 1020, 1022, 1024, and/or 1026. For example, a semantic object based on the object 1020 can be used to determine location (e.g., location can be determined based on the street address when GPS service is unavailable); a semantic object based on the object 1022 can be used to determine whether a delivery vehicle with a package for a user is nearby; and/or a semantic object based on the object 1026 can be used to identify the restaurant associated with the object 1026 and provide information (e.g., ratings of food and service) to a user of the semantic processing system 1000.

Further, a semantic object based on the object 1024 can be used to determine whether a person (e.g., a friend of the user of the semantic processing system 1000) who has expressly given permission for their face to be recognized by the user of the semantic processing system 1000 is nearby. In some embodiments, to safeguard the privacy of individuals whose images are captured by the semantic processing system 1000, personal identification data (e.g., facial recognition data) can be stored locally on the semantic processing system 1000 in a secured portion (e.g., an encrypted storage area) of the semantic processing system 1000 that is not shared with or accessible to any other devices.

The display component 1010 can be configured to receive one or more inputs to interact with interface elements that are displayed on the display component 1010. For example, based on a user touching a portion of the display component 1010 that displays a recognized object, the semantic processing system 1000 can access information associated with a semantic object associated with the recognized object.

FIG. 11 depicts an example of sensor based semantic object generation including location identification according to example embodiments of the present disclosure. FIG. 11 includes an illustration of a semantic processing system 1100 that can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 1100 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 11, the semantic processing system 1100 includes a display component 1110 and an object 1120.

In this example, a display component 1110 of the semantic processing system 1100 displays an environment captured by a camera (not shown) of the semantic processing system 1100. The display component 1110 displays objects that have been detected and/or recognized by the semantic processing system 1100, including the object 1120, which is determined to be an entrance to a location to which a user is travelling. The semantic processing system can generate a semantic object based on the object 1120 that can be used to provide navigational instructions to a user of the semantic processing system 1100. In some environments, entrances to different locations can be in close proximity to one another, and a geolocation signal (e.g., GPS) may not be available or may be too inaccurate to distinguish between a correct entrance and an incorrect entrance. Accordingly, the semantic processing system 1100 can recognize the correct entrance by generating a semantic object based on visual input from the location and providing a user of the semantic processing system 1100 with directions based on the generated semantic object.

FIG. 12 depicts an example of sensor based semantic object generation including navigation according to example embodiments of the present disclosure. FIG. 12 includes an illustration of a semantic processing system 1200 that can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 1200 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 12, the semantic processing system 1200 includes a display component 1210, a navigation indicator 1212, a status indicator 1214, a destination indicator 1216, a status area 1220, and a status indicator 1222.

In this example, the semantic processing system 1200 includes a display component 1210 that displays one or more images and/or text. At the top of the display component 1210 is a status area 1220 that can include various indicators, including the status indicator 1222, to indicate that the semantic processing system 1200 is in a navigational mode. The semantic processing system 1200 can generate other indicators in various sizes, shapes, and/or colors, including the status indicator 1214 that is above the destination indicator 1216, which indicates the destination that a user of the semantic processing system 1200 is travelling to. The display component 1210 can also generate output that includes text instructions (“120 ft. Head West”) and a graphical indicator (an arrow) that points in the direction of the destination location. The semantic processing system 1200 can also generate the navigation indicator 1212 that includes an identifier associated with the destination location (“Joshua Tree National Park”) and that, in some embodiments, can receive one or more inputs from a user to provide more information associated with the destination location. In some embodiments, the status indicator 1214 can change color, shape, and/or size when the user arrives at the destination location.

FIG. 13 depicts an example of sensor based semantic object generation including location identification according to example embodiments of the present disclosure. FIG. 13 includes an illustration of a semantic processing system 1300 that can include one or more portions of one or more systems (e.g., one or more computing systems) or devices (e.g., one or more computing devices) including the user device 102 shown in FIG. 1, the remote computing device 104 shown in FIG. 1, and/or the computing device 200 shown in FIG. 2. Further, the detection, recognition, and/or processing of one or more portions of an environment by the semantic processing system 1300 can be implemented as an algorithm on the hardware components of one or more devices or systems (e.g., the user device 102, the remote computing device 104, and/or the computing device 200) to, for example, generate one or more semantic objects and output based on one or more objects. As shown in FIG. 13, the semantic processing system 1300 includes a display component 1310, a status area 1320, a status indicator 1322, an interface element 1324, and an interface element 1326.

In this example, the semantic processing system 1300 includes a display component 1310 that includes a status area 1320 (e.g., a status bar) that can generate indicators of a status of the device or of semantic objects that have been generated by the semantic processing system 1300 in response to recognition of one or more states of one or more objects in an environment. The status area 1320 can include a status indicator 1322 that can indicate that the semantic processing system 1300 has performed recognition of the environment and has provided information associated with the environment. In this example, the semantic processing system 1300 provides an interface element 1324 that includes an indication of the location of the environment (“Hall of Music”) and also provides the interface element 1326 that provides a user with different ways to interact with the semantic object associated with the environment. For example, a user of the semantic processing system 1300 can touch the interface element 1326 to access information about the object (e.g., ratings of the hall of music).

FIG. 14 depicts a flow diagram of an example method of sensor based semantic object generation according to example embodiments of the present disclosure. One or more portions of the method 1400 can be executed or implemented on one or more computing devices or computing systems including, for example, the user device 102, the remote computing device 104, and/or the computing device 200. One or more portions of the method 1400 can also be executed or implemented as an algorithm on the hardware components of the devices disclosed herein. FIG. 14 depicts steps performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that various steps of any of the methods disclosed herein can be adapted, modified, rearranged, omitted, and/or expanded without deviating from the scope of the present disclosure.

At 1402, the method 1400 can include receiving data, including, for example, state data that is based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects including physical objects (e.g., entrances to buildings, street addresses, signage, and/or electronic devices).

The state data can include information associated with the state of the environment including one or more objects in the environment. The state of the environment including the one or more objects can include a temporal state (e.g., the time of day when the sensor output associated with the state of the environment was output by the one or more sensors) that can also include one or more durations of events associated with the environment (e.g., the duration of scheduled events); a location state associated with the location of the one or more objects in the environment (e.g., a latitude and longitude and/or a relative location of the one or more objects to one another or to a point of reference location); and/or a physical state including one or more physical characteristics (e.g., appearance including color and/or texture; physical dimensions including size, volume, mass, and/or weight; and/or audio characteristics).
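
The following sketch illustrates one possible in-memory representation of such state data, grouping temporal, location, and physical state. The field names are assumptions for illustration, not a required format.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class StateData:
        # Temporal state: when the sensor output was produced, plus optional event durations.
        timestamp: float
        event_durations_s: list = field(default_factory=list)
        # Location state: absolute coordinates and/or positions relative to a reference point.
        latitude: Optional[float] = None
        longitude: Optional[float] = None
        relative_positions_m: dict = field(default_factory=dict)
        # Physical state: appearance, dimensions, and/or audio characteristics.
        physical: dict = field(default_factory=dict)

    sample = StateData(timestamp=1700000000.0, latitude=37.78, longitude=-122.41,
                       physical={"color": "red", "size_cm": [30, 20, 10]})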

In some embodiments, the one or more sensors can include one or more optical sensors (e.g., one or more cameras); one or more periscopic cameras including one or more cameras that have a field of view that exceeds one-hundred and eighty degrees; one or more audio sensors (e.g., one or more microphones); one or more tactile sensors (e.g., surfaces that can detect pressure or capacitance); one or more pressure sensors including barometric sensors; one or more gyroscopic sensors; one or more accelerometers including a configuration in which the one or more accelerometers can determine acceleration along any of three axes (e.g., x axis, y axis, and z axis); one or more humidity sensors including one or more sensors that can detect the level of moisture in the air; one or more electromagnetic sensors; and/or one or more thermal sensors.

In some embodiments, the semantic processing system can include a display component (e.g., a liquid crystal display (LCD), an organic light emitting diode (OLED), plasma display panel, electronic ink, and/or a cathode ray tube) that is configured to display one or more images that can include images of an environment that includes one or more objects that are detected by one or more sensors. Further, in some embodiments the display component can include the one or more sensors (e.g., a touch screen) so that the display component can be used as an input device.

Further, the one or more periscopic cameras can be configured or positioned to capture the one or more images including the one or more objects or portions of the one or more objects that are not within a visual plane of the display component. For example, the one or more periscopic cameras can be positioned on any portion of the semantic computing system including a side facing a user holding the semantic computing system (e.g., on the same side as a display component), a side facing away from a user holding the semantic computing system (e.g., a side opposite a display component), and/or any of the edges of the device.

The display component of the semantic computing system can include a visual plane, which can include a plane that, if it were an optical sensor, would capture one or more images within a range of less than one hundred and eighty degrees of a portion of the optical sensor (e.g., images perpendicular to or behind the visual plane would not be captured). For example, if the semantic processing device is in the shape of a rectangular cuboid, the one or more periscopic cameras can be located on any of the sides of the cuboid.

At 1404, the method 1400 can include generating one or more semantic objects corresponding to the one or more objects. The one or more semantic objects can be generated, for example, based in part on data including the state data and/or an object recognition model including a machine learned model.

The semantic processing system can analyze the state data and perform one or more operations on the state data including comparing the state data to information that is associated with one or more portions of the state data. For example, the appearance of the one or more objects can be compared to a database of objects that can be used to identify the one or more objects. Based on the identification of the one or more objects, the semantic processing system can generate further information including attributes of the one or more objects. In another example, the state data can include a location and time which can be used to determine, based on a comparison to a database of events, whether one of the events in the database will occur within a given location at a time period that the user of the device will be present in the location.

In some embodiments, the semantic processing system can access a machine learned model (e.g., access a machine learned model that has been stored locally and/or a machine learned model that is stored on a remote computing device) that has been created using a classification dataset including classifier data that includes a set of classified features and a set of classified object labels associated with training data that can be based on, or associated with, a plurality of training objects (e.g., physical objects or simulated objects that are used as training inputs for the machine learned model). The classification dataset can be based in part on inputs from one or more sensors (e.g., cameras and/or microphones) that have been used to generate visual outputs and audio outputs based on the visual inputs and the audio inputs respectively. For example, the machine learned model can be created using a set of cameras and microphones that captured training data including video and audio of an urban area that includes various objects including waterbodies, waterways, buildings (e.g., houses and/or hotels), streets, alleyways, vehicles (e.g., automobiles and/or trams), people, and/or surfaces with text (e.g., movie posters).

The one or more semantic objects can include a set of attributes (e.g., a set of attributes for each of the one or more semantic objects). For example, the set of attributes associated with the one or more semantic objects can include one or more object identities including the identity of the one or more objects associated with the one or more semantic objects (e.g., the manufacturer and model of an automobile); one or more object types associated with the type, category, or class of the one or more objects associated with the one or more semantic objects (e.g., an automobile can be associated with a vehicle type); an object location including a geographic location associated with the one or more objects associated with the one or more semantic objects (e.g., an address of a building object); a monetary value (e.g., one or more prices associated with an object); an ownership status including the owner of an object (e.g., the owner of a house); and/or a set of physical characteristics (e.g., a size, appearance, and/or mass associated with an object).
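
As a hypothetical example only, the set of attributes for a single semantic object could be laid out as follows; the keys, values, and units are illustrative assumptions rather than a required schema.

    # Illustrative attribute set for one semantic object (an automobile).
    automobile_attributes = {
        "object_identity": "ExampleMotors Model Z",      # identity of the underlying object
        "object_type": "vehicle",                        # type, category, or class
        "object_location": (37.7790, -122.4194),         # latitude and longitude
        "monetary_value_usd": 23500,                     # price associated with the object
        "ownership_status": "dealer-owned",              # ownership of the object
        "physical_characteristics": {"color": "blue", "mass_kg": 1500},
    }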

At 1406, the method 1400 can include determining, based in part on the set of attributes of the one or more semantic objects, one or more operating modes associated with the one or more semantic objects. The one or more operating modes can determine the way in which the one or more semantic objects are processed and/or used by the semantic processing system. As such, the semantic processing system can selectively dedicate computing resources to a subset of possible operations based on the one or more attributes of the one or more semantic objects (e.g., detecting signage that includes text can result in a determination that a text recognition mode will be used to process the one or more semantic objects associated with the signage).

The one or more operating modes can include a text recognition mode associated with recognizing textual information in the environment (e.g., recognizing when an object contains text or pictograms); a location recognition mode associated with recognizing one or more locations in the environment (e.g., locating an entrance to a restaurant); an object recognition mode associated with recognizing the one or more objects in the environment (e.g., recognizing an automobile in a parking lot); and/or an event recognition mode associated with recognizing an occurrence of one or more events in the environment (e.g., associating a time and location with a scheduled event).
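
A minimal sketch, assuming semantic objects are represented as dictionaries with an "attributes" entry, of how operating modes could be selected from the set of attributes; the attribute keys and mode names are illustrative choices, not the disclosed logic.

    def determine_operating_modes(semantic_objects):
        # Select operating modes based on which attributes each semantic object carries.
        modes = set()
        for obj in semantic_objects:
            attributes = obj.get("attributes", {})
            matched = False
            if "text" in attributes:
                modes.add("text_recognition")
                matched = True
            if "street_address" in attributes or "entrance" in attributes:
                modes.add("location_recognition")
                matched = True
            if "event_time" in attributes and "event_location" in attributes:
                modes.add("event_recognition")
                matched = True
            if not matched:
                modes.add("object_recognition")
        return modes

    determine_operating_modes([{"attributes": {"text": "Qingdao Daily"}}])  # -> {"text_recognition"}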

At 1408, the method 1400 can include determining one or more relevance values corresponding to the one or more semantic objects. The one or more relevance values can be based in part on an extent to which each of the one or more semantic objects is associated with context data. The context data can include various characteristics associated with the environment including data associated with a time of day; a current location (e.g., a geographical location and/or address associated with the environment); one or more scheduled events (e.g., one or more events that will occur within a predetermined period of time); one or more user locations; or one or more user preferences (e.g., one or more preferences of a user including restaurant preferences, literature preferences, and/or beverage preferences). In some embodiments, the one or more object outputs can be based in part on the one or more relevance values that correspond to the one or more semantic objects.
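
The sketch below shows one way a relevance value could be computed from the extent to which a semantic object's attributes match the context data. The weights and keys are arbitrary illustrative choices and are not specified by the disclosure.

    def relevance_value(semantic_object, context):
        # context is assumed to hold keys such as "current_location", "time_of_day",
        # and "user_preferences"; the weights below are illustrative only.
        score = 0.0
        attributes = semantic_object.get("attributes", {})
        if attributes.get("object_type") in context.get("user_preferences", []):
            score += 0.5
        if attributes.get("object_location") == context.get("current_location"):
            score += 0.3
        if attributes.get("event_time") == context.get("time_of_day"):
            score += 0.2
        return score

    relevance_value({"attributes": {"object_type": "restaurant"}},
                    {"user_preferences": ["restaurant"], "time_of_day": "evening"})  # -> 0.5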

At 1410, the method 1400 can include generating, based in part on the one or more operating modes, one or more object outputs associated with the one or more semantic objects. The one or more object outputs can include one or more outputs via one or more output devices of the semantic processing system (e.g., one or more display devices, audio devices, and/or haptic output devices). The text recognition mode can produce one or more object outputs that include text related output including translations of text that is recognized (e.g., generating Russian text based on detection and translation of an English text).

In some embodiments, the one or more object outputs can include one or more visual indications (e.g., one or more visual images produced by a display device of the semantic processing system) and/or one or more audio indications (e.g., one or more sounds produced by an audio output device of the semantic processing system). For example, the one or more object outputs can include a translation displayed on a display device, audio indications that include an audio version of a written text (e.g., text to speech), and/or one or more images that are superimposed on camera imagery of an environment.

At 1412, the method 1400 can include modifying, based in part on the state data or the semantic data, the one or more visual indications or the one or more audio indications. Modifying the one or more visual indications or the one or more audio indications can include transforming the one or more visual indications into one or more modified audio indications (e.g., generating artificial speech based on detected text); transforming the one or more audio indications into one or more modified visual indications (e.g., generating text based on audio inputs to a microphone); modifying a size of the one or more visual indications (e.g., increasing the size of an object captured by a camera); modifying one or more color characteristics of the one or more visual indications (e.g., brightening the one or more visual indications); and/or modifying an amplitude of the one or more audio indications (e.g., increasing the volume of one or more audio indications). Such modifications of the one or more visual indications and/or the one or more audio indications can be used to enhance any user's experience and can be particularly useful for individuals with visual or hearing impairments. For example, the semantic processing system can enhance the volume of sounds that would otherwise be inaudible for an individual with a hearing impairment.
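
As a non-limiting sketch, the following function models two of the modifications described above: transforming a visual indication (text) into an audio indication, and amplifying an audio indication. The placeholder string stands in for output from a real text-to-speech engine, which is assumed rather than implemented here.

    def modify_indications(visual_text=None, audio_samples=None, gain=1.0):
        # Text-to-speech is represented by a placeholder string; amplifying an audio
        # indication is modeled as scaling sample values by a gain factor.
        outputs = {}
        if visual_text is not None:
            outputs["audio_from_text"] = f"<synthesized speech for: {visual_text}>"
        if audio_samples is not None:
            outputs["amplified_audio"] = [sample * gain for sample in audio_samples]
        return outputs

    # Example: speak detected text and double the volume of a captured sound.
    modify_indications(visual_text="120 ft. Head West", audio_samples=[0.1, -0.2, 0.05], gain=2.0)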

FIG. 15 depicts a flow diagram of an example method of sensor based semantic object generation according to example embodiments of the present disclosure. One or more portions of the method 1500 can be executed or implemented on one or more computing devices or computing systems including, for example, the user device 102, the remote computing device 104, and/or the computing device 200. One or more portions of the method 1500 can also be executed or implemented as an algorithm on the hardware components of the devices disclosed herein. FIG. 15 depicts steps performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that various steps of any of the methods disclosed herein can be adapted, modified, rearranged, omitted, and/or expanded without deviating from the scope of the present disclosure.

At 1502, the method 1500 can include determining, based in part on the set of attributes (e.g., the set of attributes in the method 1400) of the one or more semantic objects (e.g., the one or more semantic objects in the method 1400), object data that matches the one or more semantic objects. For example, the semantic processing system can match the set of attributes to the object data based on one or more comparisons between portions of the set of attributes and the object data. The object data can include information associated with one or more related objects (e.g., a semantic object for a hat can be associated with other articles of clothing); one or more remote data sources (e.g., a semantic object for a song can be associated with a website associated with the singer of the song); one or more locations; and/or one or more events.

At 1504, the method 1500 can include accessing one or more portions of the object data that matches the one or more semantic objects. For example, the semantic processing system can access one or more portions of the object data that are stored on one or more remote computing devices. In some embodiments, the one or more object outputs can be based in part on the one or more portions of the object data that matches the one or more semantic objects. For example, when the object data includes links to one or more remote computing devices that are associated with the one or more semantic objects, the one or more object outputs can include those links.
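
A minimal sketch of matching object data to a semantic object by comparing attribute key/value pairs against stored records, which could be held locally or on a remote computing device; the record fields, the example URL, and the matching rule (any shared key/value pair) are hypothetical.

    def match_object_data(semantic_object, object_data_records):
        # object_data_records stands in for object data held locally or remotely.
        attributes = semantic_object.get("attributes", {})
        matches = []
        for record in object_data_records:
            if any(record.get(key) == value for key, value in attributes.items()):
                matches.append(record)
        return matches

    records = [{"object_type": "handbag",
                "related_objects": ["wallet", "scarf"],
                "remote_source": "https://example.com/catalog"}]
    match_object_data({"attributes": {"object_type": "handbag"}}, records)  # -> [records[0]]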

At 1506, the method 1500 can include generating, based in part on the state data or the one or more semantic objects, one or more interface elements associated with the one or more objects. The one or more interface elements can include one or more images (e.g., graphical user interface elements including still or animated pictures, pictograms, and/or text) responsive to one or more inputs (e.g., the one or more interface elements can initiate or trigger one or more operations based on a haptic input and/or an audio input). For example, the one or more interface elements can include a status indicator (e.g., a status bar displayed on a display component of the semantic processing system) that can provide one or more incremental (e.g., every minute, every hour, and/or every day) and/or continuous (e.g., real-time) indications associated with the state of the one or more objects (e.g., the location and/or closing time of a restaurant).

In some embodiments, recognition of the one or more objects can be performed as a continuous process (e.g., continuous recognition of the one or more objects) so that the one or more objects (e.g., sensor output including visual and/or audio sensor output associated with the one or more objects) can be detected, identified, and/or recognized in real time, and the one or more interface elements including the status indicator can also be updated continuously (e.g., as the one or more objects are recognized in real time). Further, the one or more interface elements can be used to provide navigational instructions (e.g., textual or audio instructions associated with a path to a location) and other information related to the one or more objects in the environment.

At 1508, the method 1500 can include determining whether, when, or that one or more inputs are received by the semantic processing system. The one or more inputs can include one or more inputs from a user of the semantic processing system including one or more visual inputs (e.g., waving a hand or blinking in front of a camera of the semantic processing system); one or more audio inputs (e.g., speaking a command into a microphone of the semantic processing system); and/or one or more haptic inputs (e.g., touching a portion of a display component of the semantic processing system). Further, the one or more inputs can include one or more inputs to a device associated with the semantic processing system including a computing device and/or an input device (e.g., a stylus and/or a mouse).

In response to receiving the one or more inputs, the method 1500 proceeds to 1510. In response to not receiving the one or more inputs, the method can end or return to a previous part of the method 1500 including 1502, 1504, or 1506.

At 1510, the method 1500 can include, in response to receiving one or more inputs to the one or more interface elements, determining one or more remote computing devices that include at least a portion of the object data (e.g., one or more remote computing devices that store some part of the object data). The one or more object outputs can include one or more remote source indications associated with the one or more remote computing devices that comprise at least a portion of the object data (e.g., IP addresses associated with the one or more remote computing devices).

FIG. 16 depicts a flow diagram of an example method of sensor based semantic object generation according to example embodiments of the present disclosure. One or more portions of the method 1600 can be executed or implemented on one or more computing devices or computing systems including, for example, the user device 102, the remote computing device 104, and/or the computing device 200. One or more portions of the method 1600 can also be executed or implemented as an algorithm on the hardware components of the devices disclosed herein. FIG. 16 depicts steps performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that various steps of any of the methods disclosed herein can be adapted, modified, rearranged, omitted, and/or expanded without deviating from the scope of the present disclosure.

At 1602, the method 1600 can include determining, based in part on the state data (e.g., the state data in the method 1400) or the one or more semantic objects (e.g., the one or more semantic objects in the method 1400), the one or more objects (e.g., the one or more objects in the method 1400) that comprise one or more semantic symbols (e.g., one or more graphemes including one or more letters, one or more logograms, one or more syllabic characters, and/or one or more pictograms).

At 1604, the method 1600 can include determining, based in part on the one or more semantic symbols, one or more words associated with the one or more semantic symbols (e.g., using a list of words, certain combinations of the one or more semantic symbols can be associated with words). In some embodiments, the set of attributes (e.g., the set of attributes in the method 1400) of the one or more semantic objects can include the one or more words. For example, a poster with text indicating “Winter palace restaurant grand opening on August 24” can be associated with a poster semantic object that includes a set of attributes with restaurant opening as the value for an event type attribute, August 24 as the value for an event date attribute, and a geographic coordinate associated with the Winter palace restaurant as the value for the location attribute.

At 1606, the method 1600 can include determining a detected language that is associated with the one or more semantic symbols. For example, based in part on the combinations of the one or more semantic symbols (e.g., words associated with the one or more semantic symbols), the semantic processing system can determine the language (e.g., a language including Spanish, English, Russian, and/or Japanese) that is associated with the one or more semantic symbols.

At 1608, the method 1600 can include generating, based in part on translation data, a translated output when the detected language is not associated with a default language (e.g., a language that a user of the semantic processing system has selected as being the language into which the detected language is translated when the detected language is not the same as the default language). The translation data can include one or more semantic symbols in the default language and one or more semantic symbols in the detected language. The semantic processing system can compare the one or more semantic symbols in the detected language to the one or more semantic symbols in the default language to perform an analysis and translate the detected language.

The translated output can include the one or more semantic symbols in the default language that correspond to a portion of the one or more semantic symbols in the detected language (e.g., a multi-language dictionary that includes a listing of one or more words in the default language, each of which is associated with the corresponding word in the detected language). In some embodiments, the one or more object outputs can be based in part on the translated output (e.g., the one or more object outputs can include a visual indication or an audio indication of the translation).
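
The sketch below illustrates this translation flow at a toy scale: detect whether text is in the default language and, if not, map it through translation data. The single-entry dictionary and the character-range language heuristic are simplifications assumed for this example, not the disclosed translation method.

    DEFAULT_LANGUAGE = "en"

    # A one-entry dictionary standing in for translation data; real translation data
    # and language detection would be far richer than this sketch.
    TRANSLATION_DATA = {("zh", "en"): {"青岛日报": "Qingdao Daily"}}

    def detect_language(text):
        # Crude illustrative heuristic: treat any CJK character as Chinese.
        return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"

    def translate_if_needed(text):
        detected = detect_language(text)
        if detected == DEFAULT_LANGUAGE:
            return text
        table = TRANSLATION_DATA.get((detected, DEFAULT_LANGUAGE), {})
        return table.get(text, text)  # fall back to the original text if unknown

    translate_if_needed("青岛日报")  # -> "Qingdao Daily"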

FIG. 17 depicts a flow diagram of an example method of sensor based semantic object generation according to example embodiments of the present disclosure. One or more portions of the method 1700 can be executed or implemented on one or more computing devices or computing systems including, for example, the user device 102, the remote computing device 104, and/or the computing device 200. One or more portions of the method 1700 can also be executed or implemented as an algorithm on the hardware components of the devices disclosed herein. FIG. 17 depicts steps performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that various steps of any of the methods disclosed herein can be adapted, modified, rearranged, omitted, and/or expanded without deviating from the scope of the present disclosure.

At 1702, the method 1700 can include receiving data, including location data that includes information associated with a current location of the environment (e.g., a latitude and longitude of the current location) and a destination location (e.g., a destination location including an address and/or a latitude and longitude selected by a user of the semantic processing system). In some embodiments, the location data can include a relative location (e.g., the current location is south-west of a user's place of business).

At 1704, the method 1700 can include determining, based in part on the location data and the state of one or more objects (e.g., the one or more objects in the method 1400) within a field of view of the one or more sensors, a path from the current location to the destination location (e.g., a path between the current location and the destination location that avoids intervening obstacles). For example, the semantic processing system can determine a shortest path from the current location to the destination location that does not go through any obstacles (e.g., a river or construction zone).

At 1706, the method 1700 can include generating one or more directions (e.g., a series of steps based on locations along the path or one or more general directions to travel in a compass direction for a period of time) based in part on the one or more semantic objects and the path from the current location to the destination location. Further, the semantic processing system can determine one or more semantic objects that can be used as landmarks associated with the one or more directions (e.g., a semantic object associated with a restaurant can be used as part of the one or more directions: “turn left at the Winter palace restaurant one block ahead”). In some embodiments, the one or more object outputs can be based in part on the one or more directions (e.g., the one or more visual indications or the one or more audio indications can include directions).
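
As an illustrative sketch, the following function turns a path of latitude/longitude waypoints into simple compass directions and mentions a landmark semantic object when one is associated with a waypoint. The bearing approximation ignores latitude scaling and is not a full navigation algorithm; the waypoint and landmark values are made up.

    import math

    def generate_directions(path, landmarks):
        # path is a list of (latitude, longitude) waypoints; landmarks maps a waypoint
        # to the name of a semantic object that can be mentioned in the directions.
        directions = []
        for (lat1, lon1), (lat2, lon2) in zip(path, path[1:]):
            bearing = math.degrees(math.atan2(lon2 - lon1, lat2 - lat1)) % 360
            compass = ["north", "east", "south", "west"][int(((bearing + 45) % 360) // 90)]
            step = f"Head {compass}"
            if (lat2, lon2) in landmarks:
                step += f" toward {landmarks[(lat2, lon2)]}"
            directions.append(step)
        return directions

    path = [(37.7780, -122.4200), (37.7781, -122.4150)]
    landmarks = {(37.7781, -122.4150): "Winter palace restaurant"}
    generate_directions(path, landmarks)  # -> ["Head east toward Winter palace restaurant"]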

The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. One of ordinary skill in the art will recognize that the inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, server processes discussed herein may be implemented using a single server or multiple servers working in combination. Databases and applications may be implemented on a single system or distributed across multiple systems. Distributed components may operate sequentially or in parallel.

While the present subject matter has been described in detail with respect to specific example embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the scope of the present disclosure is by way of example rather than by way of limitation, and the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

What is claimed is:
1. A computer-implemented method of object recognition, the method comprising: receiving, by a computing system comprising one or more computing devices, state data based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects; generating, by the computing system, based in part on the state data and a machine-learned model, one or more semantic objects corresponding to the one or more objects, wherein the machine-learned model is configured to recognize the one or more objects, and wherein the one or more semantic objects comprise a set of attributes associated with one or more words; determining, by the computing system, based in part on the set of attributes associated with the one or more words, one or more operating modes comprising a text recognition mode associated with recognizing textual information in the environment and associating the textual information with a time and a location of an event; and generating, by the computing system, based in part on the one or more operating modes comprising the text recognition mode, one or more object outputs associated with the one or more semantic objects, wherein the one or more object outputs comprise one or more visual indications or one or more audio indications associated with the time and the location of the event.
2. The computer-implemented method of claim 1, wherein the computing system comprises a display component configured to display one or more images comprising images of the environment including the one or more objects that are detected by the one or more sensors.
3. The computer-implemented method of claim 2, wherein the one or more sensors comprise one or more periscopic cameras that are positioned to capture the one or more images including the one or more objects or portions of the one or more objects that are not within a visual plane of the display component.
4. The computer-implemented method of claim 1, further comprising: determining, by the computing system, based in part on the set of attributes of the one or more semantic objects, object data that matches the one or more semantic objects, wherein the object data comprises information associated with one or more related objects, one or more remote data sources, one or more locations, or one or more events; and accessing, by the computing system, one or more portions of the object data that matches the one or more semantic objects, wherein the one or more object outputs are based in part on the one or more portions of the object data that matches the one or more semantic objects.
5. The computer-implemented method of claim 4, further comprising: generating, by the computing system, based in part on the state data or the one or more semantic objects, one or more interface elements associated with the one or more objects, wherein the one or more interface elements comprise one or more images responsive to one or more inputs; and responsive to receiving the one or more inputs to the one or more interface elements, determining, by the computing system, one or more remote computing devices that comprise at least a portion of the object data, wherein the one or more object outputs comprise one or more remote source indications associated with the one or more remote computing devices that comprise at least a portion of the object data.
6. The computer-implemented method of claim 1, wherein the one or more operating modes comprise a location recognition mode associated with recognizing one or more locations in the environment, an object recognition mode associated with recognizing the one or more objects in the environment, or an event recognition mode associated with recognizing an occurrence of one or more events in the environment.
7. The computer-implemented method of claim 1, further comprising: determining, by the computing system, based in part on the state data or the one or more semantic objects, the one or more objects that comprise one or more semantic symbols, wherein the one or more semantic symbols comprise one or more letters, one or more logograms, one or more syllabic characters, or one or more pictograms; and determining, by the computing system, based in part on the one or more semantic symbols, one or more words associated with the one or more semantic symbols, wherein the set of attributes of the one or more semantic objects comprises the one or more words.
8. The computer-implemented method of claim 7, further comprising: determining, by the computing system, a detected language associated with the one or more words; and generating, by the computing system, based in part on translation data, translated output when the detected language is not associated with a default language, the translation data comprising one or more words in the default language and one or more words in the detected language, the translated output comprising the one or more words in the default language that correspond to a portion of the one or more words in the detected language, wherein the one or more object outputs are based in part on the translated output.
9. The computer-implemented method of claim 1, further comprising: receiving location data comprising information associated with a current location of the environment and a destination location; determining, by the computing system, based in part on the location data and the state of the environment comprising the one or more objects within a field of view of the one or more sensors, a path from the current location to the destination location; and generating, by the computing system, one or more directions based in part on the one or more semantic objects and the path from the current location to the destination location, wherein the one or more object outputs are based in part on the one or more directions.
10. The computer-implemented method of claim 1, further comprising: determining, by the computing system, based in part on an extent to which each of the one or more semantic objects is associated with context data, one or more relevance values corresponding to the one or more semantic objects, the context data comprising data associated with a time of day, a current location, one or more scheduled events, one or more user locations, or one or more user preferences, wherein the one or more object outputs are based in part on the one or more relevance values that correspond to the one or more semantic objects.
11. The computer-implemented method of claim 1, further comprising: modifying, by the computing system, based in part on the state data or the semantic data, the one or more visual indications or the one or more audio indications, wherein the modifying comprises transforming the one or more visual indications into one or more modified audio indications, transforming the one or more audio indications into one or more modified visual indications, modifying a size of the one or more visual indications, modifying one or more color characteristics of the one or more visual indications, or modifying an amplitude of the one or more audio indications.
12. The computer-implemented method of claim 1, wherein the set of attributes associated with the one or more semantic objects comprises one or more object identities, one or more object types, an object location, a monetary value, an ownership status, a stock keeping unit, or a set of physical characteristics.
13. One or more tangible, non-transitory computer-readable media storing computer-readable instructions that when executed by one or more processors cause the one or more processors to perform operations, the operations comprising: receiving state data based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects; generating, based in part on the state data and a machine-learned model, one or more semantic objects corresponding to the one or more objects, wherein the machine-learned model is configured to recognize the one or more objects, and wherein the one or more semantic objects comprise a set of attributes associated with one or more words; determining, based in part on the set of attributes associated with the one or more words, one or more operating modes comprising a text recognition mode associated with recognizing textual information in the environment and associating the textual information with a time and a location of an event; and generating, based in part on the one or more operating modes comprising the text recognition mode, one or more object outputs associated with the one or more semantic objects, wherein the one or more object outputs comprise one or more visual indications or one or more audio indications associated with the time and the location of the event.
14. The one or more tangible, non-transitory computer-readable media of claim 13, further comprising: determining, based in part on the set of attributes of the one or more semantic objects, object data that matches the one or more semantic objects, wherein the object data comprises information associated with one or more related objects, one or more remote data sources, one or more locations, or one or more events; and accessing one or more portions of the object data that matches the one or more semantic objects, wherein the one or more object outputs are based in part on the one or more portions of the object data that matches the one or more semantic objects.
15. The one or more tangible, non-transitory computer-readable media of claim 14, further comprising: generating, based in part on the state data or the one or more semantic objects, one or more interface elements associated with the one or more objects, wherein the one or more interface elements comprise one or more images responsive to one or more inputs; and responsive to receiving the one or more inputs to the one or more interface elements, determining one or more remote computing devices that comprise at least a portion of the object data, wherein the one or more object outputs comprise one or more remote source indications associated with the one or more remote computing devices that comprise at least a portion of the object data.
16. The one or more tangible, non-transitory computer-readable media of claim 13, further comprising: modifying, based in part on the state data or the semantic data, the one or more visual indications or the one or more audio indications, wherein the modifying comprises transforming the one or more visual indications into one or more modified audio indications, transforming the one or more audio indications into one or more modified visual indications, modifying a size of the one or more visual indications, or modifying an amplitude of the one or more audio indications.
17. A computing system comprising: one or more processors; one or more non-transitory computer-readable media storing instructions that when executed by the one or more processors cause the one or more processors to perform operations comprising: receiving state data based in part on sensor output from one or more sensors that detect a state of an environment including one or more objects; generating, based in part on the state data and a machine-learned model, one or more semantic objects corresponding to the one or more objects, wherein the machine-learned model is configured to recognize the one or more objects, and wherein the one or more semantic objects comprise a set of attributes associated with one or more words; determining, based in part on the set of attributes associated with the one or more words, one or more operating modes comprising a text recognition mode associated with recognizing textual information in the environment and associating the textual information with a time and a location of an event; and generating, based in part on the one or more operating modes comprising the text recognition mode, one or more object outputs associated with the one or more semantic objects, wherein the one or more object outputs comprise one or more visual indications or one or more audio indications associated with the time and the location of the event.
18. The computing system of claim 17, further comprising: determining, based in part on the set of attributes of the one or more semantic objects, object data that matches the one or more semantic objects, wherein the object data comprises information associated with one or more related objects, one or more remote data sources, one or more locations, or one or more events; and accessing one or more portions of the object data that matches the one or more semantic objects, wherein the one or more object outputs are based in part on the one or more portions of the object data that matches the one or more semantic objects.
19. The computing system of claim 18, further comprising: generating, based in part on the state data or the one or more semantic objects, one or more interface elements associated with the one or more objects, wherein the one or more interface elements comprise one or more images responsive to one or more inputs; and responsive to receiving the one or more inputs to the one or more interface elements, determining one or more remote computing devices that comprise at least a portion of the object data, wherein the one or more object outputs comprise one or more remote source indications associated with the one or more remote computing devices that comprise at least a portion of the object data.
20. The computing system of claim 17, further comprising: modifying, based in part on the state data or the semantic data, the one or more visual indications or the one or more audio indications, wherein the modifying comprises transforming the one or more visual indications into one or more modified audio indications, transforming the one or more audio indications into one or more modified visual indications, modifying a size of the one or more visual indications, or modifying an amplitude of the one or more audio indications.